Systems Biology Approaches to Mining High Throughput Biological Data
Authors
Abstract
With advances in high throughput measurement techniques, large-scale biological data have been and will continue to be produced, for example, gene expression data, protein-protein interaction (PPI) data, tandem mass spectra data, microRNA expression data, lncRNA expression data, and biomolecule-disease association data. Such data contain insightful information for understanding the mechanisms of molecular biological systems and have proved useful in diagnosis, treatment, and drug design for genetic disorders or complex diseases. For this focus issue, we invited researchers to contribute original research articles that develop or improve systems biology approaches to mining high throughput biological data.
With high throughput data, it is appealing to develop systems biology approaches to understand important biological processes. In the paper "Differential Expression Analysis in RNA-Seq by a Naive Bayes Classifier with Local Normalization," Y. Dou et al. developed a new tool, named GExposer, for the identification of differentially expressed genes with RNA-Seq data. This tool introduced a local normalization algorithm to reduce the bias of nonrandomly positioned read depth. The Naive Bayes classifier was employed to integrate fold change, transcript length, and GC-content to identify differentially expressed genes. Results on several independent tests showed that GExposer performed better than other methods.
In the paper "K-Profiles: A Nonlinear Clustering Method for Pattern Detection in High Dimensional Data," K. Wang et al. designed the nonlinear K-profiles clustering method, which can be seen as the nonlinear counterpart of the K-means clustering algorithm. The method has a built-in statistical testing procedure that ensures genes not belonging to any cluster do not affect the estimation of cluster profiles. Results from extensive simulation studies showed that K-profiles clustering outperformed the traditional linear K-means algorithm. In addition, K-profiles clustering generated biologically meaningful results from a gene expression dataset.
Replicative senescence is of fundamental importance for the process of cellular aging. In the paper "Similarities in Gene Expression Profiles during In Vitro Aging of Primary Human Embryonic Lung and Foreskin Fibroblasts," S. Diekmann et al. elucidated the cellular aging process by comparing gene expression changes, measured by RNA-Seq, in fibroblasts originating from two different tissues, embryonic lung (MRC-5) and foreskin (HFF), at five different time points during their transition into senescence. Their results showed that a number of monotonically up- and downregulated genes had a novel strong functional link to aging- and senescence-related processes.
More and more studies have shown that many complex diseases are contributed jointly …
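As a rough illustration of how a Naive Bayes classifier can combine fold change, transcript length, and GC-content into one call per gene, the following is a minimal Gaussian Naive Bayes sketch in plain Python. It is not the GExposer implementation: the Gaussian likelihood, the feature scaling, the function names, and the toy training values are all assumptions made up for this example, and the local normalization step described in the abstract is not modeled.

```python
import math


def fit_gaussian_nb(rows, labels):
    """Estimate a prior plus per-feature mean and variance for each class."""
    model = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # A small variance floor avoids division by zero on constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-6
            for col, m in zip(columns, means)
        ]
        model[cls] = (n / len(rows), means, variances)
    return model


def log_posterior(model, row):
    """Unnormalized log posterior of each class for one gene."""
    scores = {}
    for cls, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, m, var in zip(row, means, variances):
            # Naive Bayes assumption: features are independent given the class.
            score += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        scores[cls] = score
    return scores


def predict(model, row):
    """Return the class with the highest log posterior."""
    scores = log_posterior(model, row)
    return max(scores, key=scores.get)


# Hypothetical toy genes, invented for illustration only:
# (absolute log2 fold change, log10 transcript length, GC fraction)
train_rows = [
    (3.1, 3.2, 0.52), (2.7, 3.5, 0.48), (3.4, 3.0, 0.55),
    (0.2, 3.3, 0.45), (0.4, 3.6, 0.50), (0.1, 3.1, 0.47),
]
train_labels = ["DE", "DE", "DE", "not_DE", "not_DE", "not_DE"]

model = fit_gaussian_nb(train_rows, train_labels)
prediction_high = predict(model, (2.9, 3.3, 0.51))
prediction_low = predict(model, (0.3, 3.4, 0.49))
print(prediction_high, prediction_low)
```

In this toy setup the fold change dominates the decision because the two classes barely differ in length and GC-content; with real data the other two features would mainly serve to correct for length and composition biases.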
Similar resources
Reverse engineering biomolecular systems using -omic data: challenges, progress and opportunities
Recent advances in high-throughput biotechnologies have led to rapidly growing research interest in reverse engineering of biomolecular systems (REBMS). 'Data-driven' approaches, i.e. data mining, can be used to extract patterns from large volumes of biochemical data at molecular-level resolution while 'design-driven' approaches, i.e. systems modeling, can be used to simulate emergent system ...
Practical Applications of Data Mining
Technologies have undoubtedly brought tremendous change to the field of bioinformatics and other related areas. Extensive research is still being carried out on the fundamentals of data mining in genomics and proteomics, addressing recent research developments that depend on the analysis and interpretation of large amounts of data generated by high-throughput te...
Mining Biological Repetitive Sequences Using Support Vector Machines and Fuzzy SVM
Structural repetitive subsequences are the most important portion of biological sequences and play crucial roles in the corresponding sequence's fold and functionality. The biggest class of repetitive subsequences is "Transposable Elements", which has its own subclasses depending on the contexts' structures. Much research has been performed to critically determine the structure and function of repetitiv...
Classification of Information Fusion Methods in Systems Biology
Biological systems are extremely complex and often involve thousands of interacting components. Despite all efforts, many complex biological systems are still poorly understood. However, over the past few years high-throughput technologies have generated large amounts of biological data, now requiring advanced bioinformatic algorithms for interpretation into valuable biological information. Due...
Mathematical modeling of biological systems
Mathematical and computational models are increasingly used to help interpret biomedical data produced by high-throughput genomics and proteomics projects. The application of advanced computer models enabling the simulation of complex biological processes generates hypotheses and suggests experiments. Appropriately interfaced with biomedical databases, models are necessary for rapid access to, ...
Minding, OLAPing, and Mining Biological Data: Towards a Data Warehousing Concept in Biology
The considerable "algorithmic complexity" of biological systems requires a huge amount of detailed information for their complete description. High-throughput experiments (e.g., microarrays) are generating an overwhelming amount of data of biological systems at the molecular and cellular level. To adequately organize, analyze, and interpret this deluge of information will require new computatio...
Journal title:
Volume 2015, Issue
Pages -
Publication date 2015